Score distributions for Pseudo Relevance Feedback

نویسندگان

  • Javier Parapar
  • Manuel A. Presedo Quindimil
  • Alvaro Barreiro
چکیده

Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches to explicitly introduce the concept of relevance in the statistical Language Modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the Pseudo Relevance Feedback task. It is known that one of the factors that more affect to the Pseudo Relevance Feedback robustness is the selection for some queries of harmful expansion terms. In order to minimise this effect in these methods a crucial point is to reduce the number of non-relevant documents in the pseudo relevant set. In this paper, we propose an original approach to tackle this problem. We try to automatically determine for each query how many documents we should select as pseudo-relevant set. For achieving this objective we will study the score distributions of the initial retrieval and trying to discern in base of their distribution between relevant and non-relevant documents. Evaluation of our proposal showed important improvements in terms of robustness.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Query Expansion Strategy based on Pseudo Relevance Feedback and Term Weight Scheme for Monolingual Retrieval

Query Expansion using Pseudo Relevance Feedback is a useful technique for reformulating the query. In this paper, expansion terms are obtained by combining pseudo relevance feedback and equi-frequency partition of the documents with tf-idf scoring technique. It is observed that the groups of words that have same tf-idf score as that of query terms are better candidate words for query expansion ...

متن کامل

DUTIR at TREC 2011 Microblog Track

In TREC 2011 Microblog Track, we explore the use of pseudo relevance feedback to expand original query terms in topics. Hyperlink is used to enhance the performance of the retrieval results. And we set a threshold of entropy to filter retrieval results. Microblog is a Realtime Adhoc Task, so we make use of average querytweettime that comes from pseudo relevance feedback to change retrieval scor...

متن کامل

Modèles probabilistes pour les fréquences de mots et la recherche d'information. (Probabilistic Models of Document Collections)

The present study deals with word frequencies distributions and their relation to probabilistic Information Retrieval (IR) models. We examine the burstiness phenomenon (a rich get richer phenomenon) of word frequencies in textual collections. We propose to model this phenomenon as a property of probability distributions and we show that the Beta Negative Binomial distribution is a good statisti...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Inf. Sci.

دوره 273  شماره 

صفحات  -

تاریخ انتشار 2014